Abstract

We give a new bound on the sum of the linear Fourier coefficients of a Boolean function in 
terms of its parity decision tree complexity. This result generalizes an inequality of O’Donnell 
and Servedio for regular decision trees [OS08]. We use this bound to obtain the first non-trivial 
lower bound on the parity decision tree complexity of the recursive majority function. 


1 Introduction 

In this note, we explore connections between two different notions of complexity of Boolean functions
$f : \{-1,1\}^n \to \{-1,1\}$: its decision tree complexity, and the sum of its linear Fourier coefficients.

Decision trees are full binary trees with internal nodes labelled by variables $x_i$ for some $i \in [n]$
and with leaves labelled with constants $b \in \{-1,1\}$. A decision tree $D$ is said to compute $f$ if, for every
$x \in \{-1,1\}^n$, the path from the root to a leaf in $D$ defined by $x$ leads to a leaf labelled by $f(x)$.
The depth of a decision tree is the maximum number of internal nodes along any root-to-leaf path,
and the decision tree (depth) complexity of a function $f$ is the minimum depth of any decision tree
$D$ that computes $f$.

Every Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ has a unique representation as a multilinear
polynomial

$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\, \chi_S(x),$$

where $\chi_S(x) := \prod_{i \in S} x_i$ and the numbers $\hat{f}(S) = \mathbf{E}[f(x)\chi_S(x)] \in [-1,1]$ are the Fourier
coefficients of $f$. The Fourier coefficients corresponding to singleton sets $S = \{i\}$, $i \in [n]$, are called
linear Fourier coefficients. For notational clarity, we will write $\hat{f}(i)$ to denote the linear Fourier
coefficient $\hat{f}(\{i\})$. As mentioned above, we consider the measure of complexity of $f$ determined by
the sum $\sum_{i=1}^n \hat{f}(i)$ of its linear Fourier coefficients.
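For small $n$ these definitions can be checked by brute force. The Python sketch below is our own illustration (the function names are hypothetical and coordinates are 0-indexed): it averages $f(x)\chi_S(x)$ over all $2^n$ inputs and recovers the Fourier coefficients of the 3-majority function, which serves as a running example later in this note.

```python
from itertools import product
from math import prod


def fourier_coefficient(f, n, S):
    """Return hat f(S) = E_x[f(x) * chi_S(x)] for x uniform on {-1,1}^n.

    S is an iterable of 0-indexed coordinates; chi_S(x) is the product of x_i over S.
    """
    total = 0
    for x in product((-1, 1), repeat=n):
        total += f(x) * prod(x[i] for i in S)
    return total / 2 ** n


def linear_coefficients(f, n):
    """Return [hat f({0}), ..., hat f({n-1})]."""
    return [fourier_coefficient(f, n, [i]) for i in range(n)]


def maj3(x):
    """3-majority on {-1,1}^3: the sign of x_1 + x_2 + x_3."""
    return 1 if sum(x) > 0 else -1


coeffs = linear_coefficients(maj3, 3)
print(coeffs)       # [0.5, 0.5, 0.5]
print(sum(coeffs))  # 1.5
```

Each coefficient costs $2^n$ evaluations of $f$, so this is only practical for small $n$.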
In [OS08] O’Donnell and Servedio established a connection between these two measures of
complexity by proving the following inequality on the linear Fourier coefficients of a Boolean
function computed by a depth-$d$ decision tree:

O’Donnell-Servedio Inequality. Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a decision tree of
depth $d$. Then $\sum_{i=1}^n \hat{f}(i) \le \sqrt{d}$.
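For $n = 3$ the inequality can be confirmed exhaustively. The following Python sketch is our own check, not anything from [OS08]: it computes the minimum decision tree depth of each of the 256 Boolean functions on three variables by recursion over subcubes, and asserts $\sum_i \hat f(i) \le \sqrt{d}$ for each of them. The helper names are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import sqrt

N = 3
INPUTS = list(product((-1, 1), repeat=N))


def decision_tree_depth(table):
    """Minimum depth of an ordinary decision tree for the function whose
    values on INPUTS (in order) are given by the tuple `table`."""
    value = dict(zip(INPUTS, table))

    @lru_cache(maxsize=None)
    def depth(rho):
        # rho[i] in {-1, 1} fixes x_i; rho[i] == 0 leaves x_i free.
        points = [x for x in INPUTS
                  if all(r == 0 or r == xi for r, xi in zip(rho, x))]
        if len({value[x] for x in points}) == 1:
            return 0  # constant on this subcube, so a single leaf suffices
        best = N
        for i in range(N):
            if rho[i] == 0:
                children = [rho[:i] + (b,) + rho[i + 1:] for b in (-1, 1)]
                best = min(best, 1 + max(depth(c) for c in children))
        return best

    return depth((0,) * N)


def level1_sum(table):
    """sum_i hat f(i) = E_x[f(x) * (x_1 + ... + x_n)]."""
    return sum(v * sum(x) for x, v in zip(INPUTS, table)) / len(INPUTS)


worst_ratio = 0.0
for table in product((-1, 1), repeat=len(INPUTS)):
    d = decision_tree_depth(table)
    s = level1_sum(table)
    assert s <= sqrt(d) + 1e-12  # O'Donnell-Servedio inequality
    if d > 0:
        worst_ratio = max(worst_ratio, s / sqrt(d))

print(worst_ratio)  # 1.0, attained by dictators such as f(x) = x_1
```

The maximum ratio is attained by dictator functions; $\mathrm{MAJ}_3$, with $\sum_i \hat f(i) = 3/2$ and decision tree depth 3, gives a ratio of about 0.87.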

In addition to being a natural statement relating a combinatorial notion of complexity (decision 
tree complexity) to an analytic one (the sum of linear Fourier coefficients), this inequality is also 
the crux of the main algorithmic result of [OS08], the first algorithm for PAC learning the class of
monotone functions to high accuracy from uniformly random labelled examples, running in time 
polynomial in a reasonable complexity measure of the target function (in this case, its decision 
tree complexity). To date this remains our best progress towards the goal of efficiently learning 
monotone polynomial-sized DNFs, a longstanding open problem in PAC learning [Blu03].

1.1 Our main result 

Another notion of complexity of Boolean functions related to decision trees is their parity decision
tree complexity. Parity decision trees (PDTs) are generalizations of decision trees where internal
nodes are now labelled by subsets $S \subseteq [n]$ instead of indices $i \in [n]$, and the edge taken from an
internal node is determined by the parity $\bigoplus_{i \in S} x_i$ of the input (instead of the value of the single
variable $x_i$ in the case of regular decision trees). The parity decision tree (depth) complexity of a
function $f$ is the minimum depth of a parity decision tree that computes $f$.
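To make the definition concrete, the Python sketch below evaluates a parity decision tree. The nested-tuple encoding is our own assumption rather than notation from this note: a leaf is the integer $-1$ or $1$, and an internal node is a triple $(S, \text{child}_{-1}, \text{child}_{+1})$, where the branch taken is $\chi_S(x) = \prod_{i \in S} x_i$, the $\pm 1$ form of the parity $\bigoplus_{i \in S} x_i$. It checks a depth-1 tree for 3-bit parity and a depth-2 tree for 3-majority.

```python
from itertools import product
from math import prod

# Encoding (our own convention): a leaf is the int -1 or 1, and an internal
# node is a tuple (S, child_if_minus1, child_if_plus1), where S is a tuple of
# 0-indexed coordinates and the branch is decided by chi_S(x) = prod of x_i.


def evaluate(tree, x):
    """Follow the path defined by x and return the label of the leaf reached."""
    node = tree
    while not isinstance(node, int):
        S, child_minus, child_plus = node
        node = child_plus if prod(x[i] for i in S) == 1 else child_minus
    return node


def depth(tree):
    """Maximum number of internal nodes on a root-to-leaf path."""
    if isinstance(tree, int):
        return 0
    _, child_minus, child_plus = tree
    return 1 + max(depth(child_minus), depth(child_plus))


def computes(tree, f, n):
    return all(evaluate(tree, x) == f(x) for x in product((-1, 1), repeat=n))


def maj3(x):
    return 1 if sum(x) > 0 else -1


# Parity of all three variables: a single query at the root.
PARITY3_TREE = ((0, 1, 2), -1, 1)
# Depth-2 tree for MAJ_3: query x_1 x_2, then x_1 if x_1 = x_2 and x_3 otherwise.
MAJ3_TREE = ((0, 1), ((2,), -1, 1), ((0,), -1, 1))

print(depth(PARITY3_TREE), computes(PARITY3_TREE, prod, 3))  # 1 True
print(depth(MAJ3_TREE), computes(MAJ3_TREE, maj3, 3))        # 2 True
```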

Geometrically, parity decision trees correspond to partitions of the hypercube $\{-1,1\}^n$ into
affine subspaces, whereas regular decision trees partition the same hypercube into subcubes. The
PDT model of computation has received significant attention in recent years [MO09, ZS09, Sha11,
BSK12, TWXZ13, GT14, STV14, OST+14], and in particular, there has been much interest in
generalizing results that apply to normal decision trees to the more general setting of PDTs (see
e.g. the survey [ZS10]).

The parity decision tree complexity of a Boolean function $f$ can be much smaller than its regular
decision tree complexity. The parity function over $n$ variables, which can be computed by a trivial
parity decision tree of depth 1 but requires regular decision tree depth $n$, gives the largest possible
separation between the two complexity measures. As a result, many inequalities related to
decision tree complexity do not necessarily hold with respect to parity decision tree complexity. In
particular, the O’Donnell-Servedio inequality does not imply that any similar inequality must hold
between the sum of linear Fourier coefficients of a Boolean function and its parity decision tree
complexity. Our main result shows that, nevertheless, such a generalization does hold.

Theorem 1. Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a parity decision tree of depth $d$. Define
$\sigma^2 = 4\Pr[f(x) = 1]\Pr[f(x) = -1]$ to be the variance of $f$. Then

$$\sum_{i=1}^n \hat{f}(i) \le \sqrt{4 \ln 2 \cdot \sigma^2 d}.$$

The main technical component in the proof of Theorem 1 is a fundamental inequality (presented
in Lemma 3.1) concerning small-depth parity decision trees. One notable aspect of the proof of
this inequality is that it is first established for a subclass of parity decision trees called correlation-free
parity decision trees. We then show that every parity decision tree of depth $d$ can be refined
to obtain a correlation-free parity decision tree of depth at most $2d$, which yields Lemma 3.1. See
Section 3.1 for the details.

¹This result was originally circulated in an unpublished manuscript titled Discrete isoperimetry via the entropy
method (2013).

We complete the proof of Theorem 1 using an information-theoretic argument. While the proof 
can also be completed using analytical arguments and Jensen’s inequality, the information-theoretic 
argument appears to be required to obtain the sharp bounds in our theorem statement. This same 
argument can also be used in the regular decision tree model to sharpen the O’Donnell-Servedio 
theorem directly as well. 

1.2 Application: Recursive majority function 

We use Theorem 1 to obtain the first non-trivial lower bound on the parity decision tree complexity 
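of the recursive majority function. The 3-majority function is the function $\mathrm{MAJ}_3 : \{-1,1\}^3 \to
\{-1,1\}$ defined by $\mathrm{MAJ}_3(x) = (-1)^{\mathbf{1}[x_1+x_2+x_3<0]}$, that is, the sign of $x_1+x_2+x_3$. For every $k \ge 2$, the recursive majority function
$\mathrm{MAJ}_3^{\otimes k} : \{-1,1\}^{3^k} \to \{-1,1\}$ is defined by setting

$$\mathrm{MAJ}_3^{\otimes k}(x) = \mathrm{MAJ}_3\Big(\mathrm{MAJ}_3^{\otimes(k-1)}(x_{\{1,\ldots,3^{k-1}\}}),\ \mathrm{MAJ}_3^{\otimes(k-1)}(x_{\{3^{k-1}+1,\ldots,2\cdot 3^{k-1}\}}),\ \mathrm{MAJ}_3^{\otimes(k-1)}(x_{\{2\cdot 3^{k-1}+1,\ldots,3^k\}})\Big).$$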
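A direct transcription of this recursive definition, as a Python sketch under our own conventions (0-indexed tuples, hypothetical function names, and a length-1 identity base case so that $k = 1$ reduces to $\mathrm{MAJ}_3$ itself):

```python
from itertools import product


def maj3(a, b, c):
    """Sign of a + b + c for a, b, c in {-1, 1}."""
    return 1 if a + b + c > 0 else -1


def rec_maj(x):
    """MAJ_3^{(k)} evaluated on a tuple x of length 3**k.

    The input is split into three consecutive blocks of length 3**(k-1), as in the
    recursive definition. Treating a length-1 input as the identity is our own
    convenience; it makes k = 1 reduce to maj3(x_1, x_2, x_3).
    """
    n = len(x)
    if n == 1:
        return x[0]
    if n < 3 or n % 3 != 0:
        raise ValueError("input length must be a power of 3")
    m = n // 3
    return maj3(rec_maj(x[:m]), rec_maj(x[m:2 * m]), rec_maj(x[2 * m:]))


x = (1, 1, -1, -1, -1, 1, -1, 1, -1)
print(rec_maj(x))  # -1: the three blocks evaluate to 1, -1, -1

# MAJ_3^{(2)} is balanced: exactly half of its 512 inputs are mapped to 1.
print(sum(rec_maj(y) == 1 for y in product((-1, 1), repeat=9)))  # 256
```

Balance here follows from the fact that $\mathrm{MAJ}_3$ is odd, so every power of it is odd as well.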

The recursive majority function was introduced by Boppana [SW86] to determine possible gaps
between the deterministic and randomized decision tree complexity of Boolean functions. It is easy
to verify that the deterministic decision tree complexity of $\mathrm{MAJ}_3^{\otimes k}$ is $3^k$. By contrast, the problem
of determining the randomized decision tree complexity of $\mathrm{MAJ}_3^{\otimes k}$ is much more challenging:
following a sequence of works on this question [SW86, JKS03, MNSX11, Leo13, MNS+13], Magniez
et al. [MNS+13] have shown that the minimal depth $R(\mathrm{MAJ}_3^{\otimes k})$ of any randomized decision tree
that computes the $\mathrm{MAJ}_3^{\otimes k}$ function satisfies

$$\Omega(2.57143^k) \le R(\mathrm{MAJ}_3^{\otimes k}) \le O(2.64944^k),$$

but the exact randomized query complexity of the recursive majority function is still unknown.

A closely related problem that naturally arises when considering the recursive majority function 
is to determine its (deterministic) parity decision tree complexity. A standard adversary argument 
can be used to show that every parity decision tree that computes the recursive majority function 
has depth at least $2^k$. Using Theorem 1, we obtain the first lower bound on the parity decision tree
complexity of the recursive majority function that improves on this trivial lower bound. 

Theorem 2. Every parity decision tree that computes $\mathrm{MAJ}_3^{\otimes k}$ has depth $\Omega(2.25^k)$.

The proof of Theorem 2 is established by computing the linear Fourier coefficients of the $\mathrm{MAJ}_3$
function directly, using a fundamental identity on the linear Fourier coefficients of function powers
(see Fact 2.7) to determine the linear Fourier coefficients of the $\mathrm{MAJ}_3^{\otimes k}$ function, and applying the
inequality in Theorem 1. This approach is quite general, and may be useful for obtaining lower 
bounds on the parity decision tree complexity of other Boolean functions in the future as well. 

2 Preliminaries 

2.1 Information theory 

All probabilities and expectations are with respect to the uniform distribution unless otherwise 
stated. We use boldface letters (e.g. X, x) to denote random variables. The proof of Theorem 1 
uses elementary definitions and inequalities from information theory. A more thorough introduction 
to these tools can be found in [CT91]. 
Definition 2.1. The entropy of the random variable $\mathbf{X}$ drawn from the finite sample space $\Omega$
according to the probability mass function $p : \Omega \to [0,1]$ is $H(\mathbf{X}) = -\sum_{x \in \Omega} p(x)\log p(x)$. The
conditional entropy of $\mathbf{X}$ given $\mathbf{Y}$ when they are drawn from the joint probability distribution
$p : \Omega_X \times \Omega_Y \to [0,1]$ is $H(\mathbf{X} \mid \mathbf{Y}) = \sum_{x,y} p(x,y)\log\big(p(y)/p(x,y)\big)$, where $p(y) = \sum_x p(x,y)$. All logarithms are base 2.
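A small Python sketch of these two definitions for finite distributions given as dictionaries (base-2 logarithms; the function names are ours). The examples show the two extremes of conditioning: an independent $\mathbf{Y}$ leaves $H(\mathbf{X} \mid \mathbf{Y}) = H(\mathbf{X})$, and $\mathbf{Y} = \mathbf{X}$ gives $0$.

```python
from collections import defaultdict
from math import log2


def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x), for a pmf given as {outcome: probability}."""
    return -sum(q * log2(q) for q in p.values() if q > 0)


def conditional_entropy(joint):
    """H(X | Y) = sum_{x,y} p(x,y) log2(p(y) / p(x,y)), for joint = {(x, y): prob}."""
    p_y = defaultdict(float)
    for (_, y), q in joint.items():
        p_y[y] += q
    return sum(q * log2(p_y[y] / q) for (_, y), q in joint.items() if q > 0)


uniform_bit = {1: 0.5, -1: 0.5}
independent = {(a, b): 0.25 for a in (1, -1) for b in (1, -1)}
copied = {(1, 1): 0.5, (-1, -1): 0.5}

print(entropy(uniform_bit))              # 1.0
print(conditional_entropy(independent))  # 1.0: Y says nothing about X
print(conditional_entropy(copied))       # 0.0: Y determines X
```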

Definition 2.2. The binary entropy function is the function $h : [0,1] \to \mathbb{R}$ defined by $h(t) =
-t\log_2 t - (1-t)\log_2(1-t)$. The value $h(t)$ represents the entropy of a random variable $\mathbf{X}$ drawn
from $\{-1,1\}$ with $\Pr[\mathbf{X} = 1] = t$.

Fact 2.3 (Data processing inequality). If $\mathbf{X}$ and $\mathbf{Z}$ are conditionally independent given $\mathbf{Y}$, then
$H(\mathbf{X} \mid \mathbf{Z}) \ge H(\mathbf{X} \mid \mathbf{Y})$.

Fact 2.4 (Bounds on the binary entropy function). For every $t \in [-1,1]$, the binary entropy function
$h : [0,1] \to \mathbb{R}$ satisfies

$$1 - t^2 \le h\Big(\frac12 + \frac t2\Big) \le 1 - \frac{t^2}{2\ln 2}.$$
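Fact 2.4 is easy to sanity-check numerically. The following Python sketch evaluates both bounds on a grid of 2001 values of $t \in [-1,1]$; it is a spot check with a small floating-point tolerance, not a proof.

```python
from math import log, log2


def h(p):
    """Binary entropy in bits, with the convention h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


violations = 0
for k in range(-1000, 1001):
    t = k / 1000
    value = h(0.5 + t / 2)
    lower = 1 - t * t
    upper = 1 - t * t / (2 * log(2))
    if not (lower - 1e-12 <= value <= upper + 1e-12):
        violations += 1
print(violations)  # 0
```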

2.2 Fourier analysis and function composition 

We assume that the reader is familiar with the Fourier analysis of Boolean functions. For a complete 
introduction to the topic, see [O’D14].

Definition 2.5. The composition of $f : \{-1,1\}^m \to \{-1,1\}$ and $g : \{-1,1\}^n \to \{-1,1\}$ is the
function $f \circ g : \{-1,1\}^{mn} \to \{-1,1\}$ where

$$(f \circ g)(x) = f\big(g(x_1,\ldots,x_n),\ \ldots,\ g(x_{(m-1)n+1},\ldots,x_{mn})\big).$$

For $k \ge 1$, the $k$th power of $f : \{-1,1\}^n \to \{-1,1\}$ is the function $f^{\otimes k} : \{-1,1\}^{n^k} \to \{-1,1\}$
defined recursively by setting $f^{\otimes 1} = f$ and $f^{\otimes k} = f \circ f^{\otimes(k-1)}$ for $k \ge 2$.

Remark 2.6. As we can verify directly, the recursive majority function $\mathrm{MAJ}_3^{\otimes k}$ is the $k$th power
of the $\mathrm{MAJ}_3$ function.

We use the following fact on the linear Fourier coefficients of composed functions. (See
Appendix A for a proof of this fact.)

Fact 2.7. For any $f : \{-1,1\}^m \to \{-1,1\}$ and any balanced function $g : \{-1,1\}^n \to \{-1,1\}$,

$$\sum_{k \in [mn]} \widehat{f \circ g}(k) = \Big(\sum_{i \in [m]} \hat{f}(i)\Big)\Big(\sum_{j \in [n]} \hat{g}(j)\Big).$$
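Fact 2.7 can be checked by brute force on small instances. In the Python sketch below (our own helper names), the sum of linear coefficients of a function is computed as $\mathbf{E}[f(x)(x_1+\cdots+x_n)]$. With the balanced inner function $\mathrm{MAJ}_3$ the product formula holds; with the unbalanced 2-bit AND as inner function it fails, which shows that the balance assumption on $g$ cannot simply be dropped.

```python
from itertools import product


def compose(f, g, m, n):
    """f o g on m*n bits: g is applied to each of the m consecutive blocks of length n."""
    def fg(x):
        return f(tuple(g(x[k * n:(k + 1) * n]) for k in range(m)))
    return fg


def level1_sum(f, n):
    """sum_i hat f(i) = E_x[f(x) * (x_1 + ... + x_n)], by brute force."""
    return sum(f(x) * sum(x) for x in product((-1, 1), repeat=n)) / 2 ** n


def maj3(x):
    return 1 if sum(x) > 0 else -1


def and2(x):
    # 1 exactly when both inputs are 1; not balanced (E[and2] = -1/2).
    return 1 if x == (1, 1) else -1


balanced = level1_sum(compose(maj3, maj3, 3, 3), 9)
print(balanced, level1_sum(maj3, 3) ** 2)  # 2.25 2.25

# With the unbalanced inner function and2 the product formula fails.
unbalanced = level1_sum(compose(maj3, and2, 3, 2), 6)
print(unbalanced, level1_sum(maj3, 3) * level1_sum(and2, 2))  # 1.125 1.5
```

The discrepancy in the unbalanced case comes from the terms with $|S| > 1$ in the Fourier expansion of the outer function, which vanish only when $\mathbf{E}[g] = 0$.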


2.3 Parity decision trees 

As mentioned in the introduction, a parity decision tree is a rooted full binary tree where each
internal node is associated with a set $S \subseteq [n]$, the two edges leading to the children of a node
are labelled with $-1$ and $1$, respectively, and each leaf is associated with a value in $\{-1,1\}$. Each
input $x \in \{-1,1\}^n$ defines a path to a unique leaf in a parity decision tree $T$ by following the edge
labelled with $\chi_S(x)$ from a node labelled with $S$. We say that the tree $T$ computes the Boolean
function $f : \{-1,1\}^n \to \{-1,1\}$ if each input $x$ defines a path in $T$ to a leaf labelled with $f(x)$.
When $T$ computes $f$ and $\ell$ is a leaf of $T$, we write $f(\ell)$ to denote the label of $\ell$.

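We can represent each leaf of a parity decision tree $T$ with a vector $\ell \in \{-1,0,1\}^n$ where $\ell_i$ is
the expected value of the coordinate $x_i$ over the uniform distribution of all inputs $x \in \{-1,1\}^n$ that
define a path to the leaf $\ell$ in $T$. (The inputs reaching a leaf form an affine subspace, on which each $x_i$ is either fixed or balanced, so indeed $\ell_i \in \{-1,0,1\}$.) We let $\mathrm{leaf}_T : \{-1,1\}^n \to \{-1,0,1\}^n$ be the function that returns
the vector representation of the leaf reached by the path defined in $T$ for every input $x \in \{-1,1\}^n$. We write $\mathbf{E}_{\ell \in T}$ for the expectation over $\ell = \mathrm{leaf}_T(\mathbf{x})$ with $\mathbf{x}$ uniform, that is, over the leaves of $T$ weighted by their mass.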
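The following Python sketch computes the leaf vectors $\ell$ and their masses by grouping all inputs according to the path they follow. The tree encoding (a leaf is $\pm1$, an internal node is a triple $(S, \text{child}_{-1}, \text{child}_{+1})$ with 0-indexed $S$) and all names are our own assumptions; the example is the depth-2 tree for $\mathrm{MAJ}_3$ that queries $x_1x_2$ and then $x_1$ or $x_3$.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical encoding: a leaf is -1 or 1; an internal node is
# (S, child_if_minus1, child_if_plus1) with S a tuple of 0-indexed coordinates.
MAJ3_TREE = ((0, 1), ((2,), -1, 1), ((0,), -1, 1))


def leaf_path(tree, x):
    """Return the sequence of answers chi_S(x) along the path defined by x."""
    path, node = [], tree
    while not isinstance(node, int):
        S, child_minus, child_plus = node
        bit = prod(x[i] for i in S)
        path.append(bit)
        node = child_plus if bit == 1 else child_minus
    return tuple(path)


def leaf_vectors(tree, n):
    """Map each reachable leaf (named by its path) to (mass, ell), where
    ell_i is the average of x_i over the inputs x that reach the leaf."""
    groups = defaultdict(list)
    for x in product((-1, 1), repeat=n):
        groups[leaf_path(tree, x)].append(x)
    return {
        path: (len(xs) / 2 ** n, tuple(sum(x[i] for x in xs) / len(xs) for i in range(n)))
        for path, xs in groups.items()
    }


for path, (mass, ell) in sorted(leaf_vectors(MAJ3_TREE, 3).items()):
    print(path, mass, ell)
```

For this tree the sketch yields four leaves of mass $1/4$ each, with vectors $\pm(1,1,0)$ and $(0,0,\pm1)$. Leaves that no input reaches simply do not appear.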
3 Proof of Theorem 1


The main technical component of the proof of Theorem 1 is the following inequality. 

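Lemma 3.1. For any parity decision tree $T$ of depth $d$,

$$\mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big] \le 2d,$$

where the expectation is over $\ell = \mathrm{leaf}_T(\mathbf{x})$ for uniformly random $\mathbf{x} \in \{-1,1\}^n$.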

We now complete the proof of Theorem 1 assuming Lemma 3.1. The proof of the lemma then 
follows in the next subsection. 


Theorem 1 (Restated). Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a parity decision tree $T$ of
depth $d$. Define $\sigma^2 = 4\Pr[f(x) = 1]\Pr[f(x) = -1]$ to be the variance of $f$. Then

$$\sum_{i=1}^n \hat{f}(i) \le \sqrt{4 \ln 2 \cdot \sigma^2 d}.$$

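Proof. Draw $\mathbf{x} \in \{-1,1\}^n$ and $\mathbf{i} \in [n]$ independently and uniformly at random. Let us first
compute the conditional entropy $H(\mathbf{x}_{\mathbf{i}} \mid f(\mathbf{x}))$. Write $\mu = \Pr[f(\mathbf{x}) = 1]$. If $\mu \in \{0,1\}$ then $f$ is constant, every $\hat f(i) = 0$, and there is nothing to prove, so we may assume $0 < \mu < 1$. Then

$$\Pr_{\mathbf{x},\mathbf{i}}[\mathbf{x}_{\mathbf{i}} = 1 \mid f(\mathbf{x}) = 1] = \frac{\mathbf{E}_{\mathbf{x},\mathbf{i}}\Big[\big(\frac{1+\mathbf{x}_{\mathbf{i}}}{2}\big)\big(\frac{1+f(\mathbf{x})}{2}\big)\Big]}{\Pr[f(\mathbf{x}) = 1]} = \frac12 + \frac{\sum_{i=1}^n \hat f(i)}{4\mu n}.$$

Similarly,

$$\Pr_{\mathbf{x},\mathbf{i}}[\mathbf{x}_{\mathbf{i}} = 1 \mid f(\mathbf{x}) = -1] = \frac12 - \frac{\sum_{i=1}^n \hat f(i)}{4(1-\mu) n}.$$

By the definition of conditional entropy and the upper bound in Fact 2.4,

$$\begin{aligned}
H(\mathbf{x}_{\mathbf{i}} \mid f(\mathbf{x})) &= \mu\, h\Big(\frac12 + \frac{\sum_{i=1}^n \hat f(i)}{4\mu n}\Big) + (1-\mu)\, h\Big(\frac12 - \frac{\sum_{i=1}^n \hat f(i)}{4(1-\mu) n}\Big) \\
&\le \mu\Big(1 - \frac{1}{2\ln 2}\Big(\frac{\sum_{i=1}^n \hat f(i)}{2\mu n}\Big)^2\Big) + (1-\mu)\Big(1 - \frac{1}{2\ln 2}\Big(\frac{\sum_{i=1}^n \hat f(i)}{2(1-\mu) n}\Big)^2\Big) \\
&= 1 - \frac{\big(\sum_{i=1}^n \hat f(i)\big)^2}{8\ln 2 \cdot \mu(1-\mu)\, n^2}.
\end{aligned} \tag{1}$$

Since the leaf reached in $T$ by an input $x$ determines $f(x)$, the random variables $\mathbf{x}_{\mathbf{i}}$ and $f(\mathbf{x})$ are conditionally independent given $\mathrm{leaf}_T(\mathbf{x})$, so the data processing inequality implies that

$$H(\mathbf{x}_{\mathbf{i}} \mid f(\mathbf{x})) \ge H(\mathbf{x}_{\mathbf{i}} \mid \mathrm{leaf}_T(\mathbf{x})). \tag{2}$$

We also have that

$$H(\mathbf{x}_{\mathbf{i}} \mid \mathrm{leaf}_T(\mathbf{x})) = \mathop{\mathbf{E}}_{\ell \in T}\Big[h\Big(\frac12 + \frac{\sum_{i=1}^n \ell_i}{2n}\Big)\Big],$$

where the expectation is over the distribution defined by the relative mass of each leaf in $T$.
Applying the lower bound in Fact 2.4 (with $t = \frac1n\sum_{i=1}^n \ell_i$), we get

$$H(\mathbf{x}_{\mathbf{i}} \mid \mathrm{leaf}_T(\mathbf{x})) \ge 1 - \frac{1}{n^2}\mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big]. \tag{3}$$

Combining (1)-(3), we obtain

$$\Big(\sum_{i=1}^n \hat f(i)\Big)^2 \le 2\ln 2 \cdot 4\mu(1-\mu)\, \mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big],$$

and the theorem follows from the bound in Lemma 3.1, since $4\mu(1-\mu) = \sigma^2$. □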
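The chain (1)-(3) can be traced numerically on a small example. The Python sketch below is our own check, using a hypothetical nested-tuple tree encoding: it takes the depth-2 parity decision tree for $\mathrm{MAJ}_3$ from Remark 3.8, computes $H(\mathbf{x}_{\mathbf{i}} \mid f(\mathbf{x}))$ and $H(\mathbf{x}_{\mathbf{i}} \mid \mathrm{leaf}_T(\mathbf{x}))$ exactly by enumeration, and prints each inequality of the proof.

```python
from collections import defaultdict
from itertools import product
from math import log, log2, prod

# Hypothetical encoding: a leaf is -1 or 1; an internal node is
# (S, child_if_minus1, child_if_plus1) with S a tuple of 0-indexed coordinates.
MAJ3_TREE = ((0, 1), ((2,), -1, 1), ((0,), -1, 1))  # depth-2 PDT for MAJ_3
N, D = 3, 2


def walk(tree, x):
    """Return (path of query answers, leaf label) for the input x."""
    path, node = [], tree
    while not isinstance(node, int):
        S, child_minus, child_plus = node
        bit = prod(x[i] for i in S)
        path.append(bit)
        node = child_plus if bit == 1 else child_minus
    return tuple(path), node


def h(p):
    """Binary entropy in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)


inputs = list(product((-1, 1), repeat=N))
label = {x: walk(MAJ3_TREE, x)[1] for x in inputs}
leaf = {x: walk(MAJ3_TREE, x)[0] for x in inputs}


def coord_entropy_given(key):
    """H(x_i | key(x)) for uniform x and an independent uniform coordinate i."""
    groups = defaultdict(list)
    for x in inputs:
        groups[key(x)].append(x)
    total = 0.0
    for xs in groups.values():
        p_one = sum(v == 1 for x in xs for v in x) / (N * len(xs))
        total += len(xs) / len(inputs) * h(p_one)
    return total


mu = sum(label[x] == 1 for x in inputs) / len(inputs)
sigma2 = 4 * mu * (1 - mu)
level1 = sum(label[x] * sum(x) for x in inputs) / len(inputs)  # sum_i hat f(i)

# E over leaves of (sum_i ell_i)^2; on each leaf, sum_i ell_i is the mean of sum(x).
leaf_groups = defaultdict(list)
for x in inputs:
    leaf_groups[leaf[x]].append(x)
second_moment = sum(len(xs) / len(inputs) * (sum(map(sum, xs)) / len(xs)) ** 2
                    for xs in leaf_groups.values())

H_f = coord_entropy_given(lambda x: label[x])
H_leaf = coord_entropy_given(lambda x: leaf[x])
bound_1 = 1 - level1 ** 2 / (8 * log(2) * mu * (1 - mu) * N ** 2)
bound_3 = 1 - second_moment / N ** 2

print("(3):", bound_3, "<=", H_leaf)
print("(2):", H_leaf, "<=", H_f)
print("(1):", H_f, "<=", bound_1)
print("combined:", level1 ** 2, "<=", 2 * log(2) * sigma2 * second_moment,
      "<=", 4 * log(2) * sigma2 * D)
```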

Remark 3.2. A result that is similar to Theorem 1, but with a slightly weaker bound, can also be
obtained directly from Lemma 3.1 and Jensen’s inequality. This approach gives the weaker bound
$\sum_{i=1}^n \hat f(i) \le \sqrt{2d}$. See Appendix B for the details.


3.1 Proof of Lemma 3.1 

The proof of Lemma 3.1 has three main components. The first is a proof of the lemma for a class 
of parity decision trees that we call (pairwise) correlation-free. 

Definition 3.3. The parity decision tree $T$ is (pairwise) correlation-free when for every $i \neq j \in [n]$
and any path in the tree $T$, if $x_i \oplus x_j$ is fixed by the queries in the path, then so are $x_i$ and $x_j$.

Proposition 3.4. Let $T$ be a correlation-free parity decision tree of depth $d$. Then $\mathbf{E}_{\ell \in T}\big[(\sum_{i=1}^n \ell_i)^2\big] \le d$.

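Proof. Fix $i \neq j$ and consider any node $v$ in the parity decision tree at which the parity $x_i \oplus x_j$ becomes fixed, that is, $x_i \oplus x_j$ is not fixed by the queries above $v$ but is fixed once the query at $v$ is answered. Since $T$
is correlation-free, every leaf $\ell$ below $v$ satisfies $\ell_i, \ell_j \neq 0$, so $\ell_i\ell_j = x_ix_j$ for every input $x$ reaching $\ell$. Because $x_i \oplus x_j$ is not fixed above $v$, it is balanced over the inputs that reach $v$. In particular, over those inputs, $\Pr[\ell_i\ell_j = -1] =
\Pr[\ell_i\ell_j = 1] = 1/2$, so the leaves below $v$ contribute $0$ to $\mathbf{E}_{\ell \in T}[\ell_i\ell_j]$. And every path that reaches a leaf without fixing $x_i \oplus x_j$
does not fix both $x_i$ and $x_j$, so such a leaf $\ell$ satisfies $\ell_i\ell_j = 0$. This means that for every $i \neq j$,
$\mathbf{E}_{\ell \in T}[\ell_i\ell_j] = 0$ and so

$$\mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_i \ell_i\Big)^2\Big] = \sum_i \mathop{\mathbf{E}}_{\ell \in T}[\ell_i^2] + \sum_{i \neq j} \mathop{\mathbf{E}}_{\ell \in T}[\ell_i\ell_j] = \mathop{\mathbf{E}}_{\ell \in T}\Big[\sum_i \ell_i^2\Big] \le d, \tag{4}$$

where the final inequality uses the fact that at most $d$ coordinates can be fixed by the queries of
any path in $T$ (and $\ell_i^2 = 1$ exactly when $x_i$ is fixed). □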
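The vanishing of the cross terms can be seen on a small example. The Python sketch below (our own, with a hypothetical nested-tuple encoding) compares an ordinary depth-3 decision tree for $\mathrm{MAJ}_3$, which is correlation-free because it only queries single variables, with the depth-2 parity decision tree of Remark 3.8, which is not: there $x_1 \oplus x_2$ is fixed at the root without fixing $x_1$ or $x_2$.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical encoding: a leaf is -1 or 1; an internal node is
# (S, child_if_minus1, child_if_plus1) with S a tuple of 0-indexed coordinates.
# Ordinary depth-3 decision tree for MAJ_3: query x_1, x_2, and x_3 only if needed.
STD_TREE = ((0,), ((1,), -1, ((2,), -1, 1)), ((1,), ((2,), -1, 1), 1))
# Depth-2 parity decision tree for MAJ_3 that queries x_1 x_2 first.
PDT_TREE = ((0, 1), ((2,), -1, 1), ((0,), -1, 1))


def leaf_vectors(tree, n):
    """List of (mass, ell) over the reachable leaves of the tree."""
    groups = defaultdict(list)
    for x in product((-1, 1), repeat=n):
        path, node = [], tree
        while not isinstance(node, int):
            S, child_minus, child_plus = node
            bit = prod(x[i] for i in S)
            path.append(bit)
            node = child_plus if bit == 1 else child_minus
        groups[tuple(path)].append(x)
    return [(len(xs) / 2 ** n, [sum(x[i] for x in xs) / len(xs) for i in range(n)])
            for xs in groups.values()]


def moments(tree, n):
    """Return (E[(sum_i ell_i)^2], E[sum_i ell_i^2], {(i, j): E[ell_i ell_j]})."""
    leaves = leaf_vectors(tree, n)
    square_of_sum = sum(m * sum(ell) ** 2 for m, ell in leaves)
    sum_of_squares = sum(m * sum(v * v for v in ell) for m, ell in leaves)
    cross = {(i, j): sum(m * ell[i] * ell[j] for m, ell in leaves)
             for i in range(n) for j in range(i + 1, n)}
    return square_of_sum, sum_of_squares, cross


print(moments(STD_TREE, 3))  # cross terms vanish, so both moments coincide
print(moments(PDT_TREE, 3))  # E[ell_1 ell_2] is nonzero for this tree
```

For the ordinary tree both moments equal $5/2 \le 3$, while for the parity decision tree $\mathbf{E}[\ell_1\ell_2] = 1/2$.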

We want to use Proposition 3.4 by showing that we can refine every parity decision tree into
a correlation-free parity decision tree without increasing its depth by too much. The following
proposition formalizes this statement.

Proposition 3.5. Let $T$ be a parity decision tree of depth $d$. Then there is a refinement $T'$ of $T$
which is a correlation-free parity decision tree of depth at most $2d$.

Proof. For each leaf of $T$, let $J$ be a maximal set of disjoint pairs $(i,j)$ of coordinates such that $x_i \oplus x_j$ is
fixed but neither $x_i$ nor $x_j$ has been fixed by the queries down the path to the leaf. Refine $T$ by
querying the first coordinate in each such pair. Once we have done this refinement at every leaf,
the resulting tree is correlation-free. To complete the proof of the proposition, it remains to show
that at most $2d$ coordinates are fixed or correlated along any path of $T$; then $J$ contains at most $d$ pairs, so the refinement adds at most $d$ queries to each path.

Let $V$ be the subspace of $\{0,1\}^n$ spanned by the (at most) $d$ queries down any fixed path in $T$.
Let $S$ be a maximal linearly independent subset of $V$ containing only vectors of Hamming weight
1 or 2. Since $V$ has dimension at most $d$, $|S| \le d$. Let $K$ be the set of coordinates that are set
to 1 in at least one vector in $S$. Then $|K| \le 2d$. Furthermore, if $i$ is fixed or correlated, there exists
a vector $v \in V$ of Hamming weight at most 2 for which $v_i = 1$. This means that either $v \in S$ or $v$
is a linear combination of some vectors in $S$; either case implies that $i \in K$. □
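The linear algebra in this proof is over $\mathbb{F}_2$ and is easy to mechanize for small examples. In the Python sketch below (our own helper names), each query set $S$ is encoded as a bitmask, the span $V$ is enumerated explicitly (at most $2^d$ vectors), and the fixed coordinates and correlated pairs along a single path are read off from the weight-1 and weight-2 vectors in $V$. The final loop exhaustively checks the counting step for $d = 2$ queries on $n = 6$ coordinates.

```python
from itertools import combinations


def span(vectors):
    """All F_2-linear combinations of the given bitmask vectors."""
    result = {0}
    for v in vectors:
        result |= {w ^ v for w in result}
    return result


def fixed_and_correlated(queries, n):
    """For the query sets on one root-to-leaf path, return (fixed, correlated).

    A coordinate i is fixed when e_i lies in the span V of the queries; a pair
    (i, j) of unfixed coordinates is correlated when e_i + e_j lies in V.
    """
    V = span([sum(1 << i for i in S) for S in queries])
    fixed = {i for i in range(n) if (1 << i) in V}
    correlated = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if ((1 << i) | (1 << j)) in V and i not in fixed and j not in fixed
    }
    return fixed, correlated


print(fixed_and_correlated([(0, 1)], 3))           # (set(), {(0, 1)})
print(fixed_and_correlated([(0, 1, 2), (2,)], 3))  # ({2}, {(0, 1)})

# Counting step for d = 2 queries on n = 6 coordinates: at most 2d coordinates
# are fixed or belong to a correlated pair.
n, d = 6, 2
subsets = [S for r in range(1, n + 1) for S in combinations(range(n), r)]
max_touched = 0
for q1 in subsets:
    for q2 in subsets:
        fixed, correlated = fixed_and_correlated([q1, q2], n)
        touched = fixed | {i for pair in correlated for i in pair}
        max_touched = max(max_touched, len(touched))
print(max_touched, 2 * d)  # 4 4
```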
The third and final component of our proof of the lemma is a simple argument showing that
refining a parity decision tree can only increase the value of $\mathbf{E}_{\ell \in T}\big[(\sum_{i=1}^n \ell_i)^2\big]$.

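Proposition 3.6. Let $T'$ be any refinement of the parity decision tree $T$. Then

$$\mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big] \le \mathop{\mathbf{E}}_{\ell' \in T'}\Big[\Big(\sum_{i=1}^n \ell'_i\Big)^2\Big].$$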

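Proof. It suffices to establish the proof in the case where $T'$ replaces one leaf of $T$ with an extra
node. Let $v$ be the leaf in $T$ that we replace with the node with leaves $u, w$. Let $p$ be the probability
that a random input $x$ reaches the leaf $v$ in $T$. Then

$$\mathop{\mathbf{E}}_{\ell \in T'}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big] - \mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big] = p\Big(\frac12\Big(\sum_{i=1}^n u_i\Big)^2 + \frac12\Big(\sum_{i=1}^n w_i\Big)^2 - \Big(\sum_{i=1}^n v_i\Big)^2\Big).$$

Let $S \subseteq [n]$ be the set of coordinates that are fixed by the query at the node that replaced $v$ (but not before it). Then
$v_i = 0$ and $u_i = -w_i$ for each $i \in S$, while $u_i = w_i = v_i$ for each $i \notin S$. Define $\delta := \sum_{i \in S} u_i = -\sum_{i \in S} w_i$ and write $\gamma = \sum_{i=1}^n v_i$. Then

$$\frac12\Big(\sum_{i=1}^n u_i\Big)^2 + \frac12\Big(\sum_{i=1}^n w_i\Big)^2 = \frac12(\gamma+\delta)^2 + \frac12(\gamma-\delta)^2 = \gamma^2 + \delta^2 \ge \Big(\sum_{i=1}^n v_i\Big)^2.$$

□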
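A concrete instance of this computation, as a Python sketch under our own assumptions (a hypothetical nested-tuple tree encoding). We refine the depth-2 tree for $\mathrm{MAJ}_3$ at the leaf $\ell = (1,1,0)$, which has mass $p = 1/4$, by querying $x_3$; the new query fixes only $x_3$, so $\delta = \pm 1$ and the increase should be $p\delta^2 = 1/4$.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical encoding: a leaf is -1 or 1; an internal node is
# (S, child_if_minus1, child_if_plus1) with S a tuple of 0-indexed coordinates.
T = ((0, 1), ((2,), -1, 1), ((0,), -1, 1))  # depth-2 PDT for MAJ_3
# Refinement T': the leaf reached when x_1 x_2 = 1 and x_1 = 1 (ell = (1, 1, 0),
# mass 1/4) is replaced by a query of x_3 whose two leaves keep the label 1.
T_REFINED = ((0, 1), ((2,), -1, 1), ((0,), -1, ((2,), 1, 1)))


def second_moment(tree, n):
    """E over leaves (by mass) of (sum_i ell_i)^2, by enumerating all inputs."""
    groups = defaultdict(list)
    for x in product((-1, 1), repeat=n):
        path, node = [], tree
        while not isinstance(node, int):
            S, child_minus, child_plus = node
            bit = prod(x[i] for i in S)
            path.append(bit)
            node = child_plus if bit == 1 else child_minus
        groups[tuple(path)].append(x)
    # sum_i ell_i on a leaf equals the average of sum(x) over the inputs reaching it.
    return sum(len(xs) / 2 ** n * (sum(map(sum, xs)) / len(xs)) ** 2
               for xs in groups.values())


before = second_moment(T, 3)
after = second_moment(T_REFINED, 3)
print(before, after, after - before)  # 2.5 2.75 0.25
```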


We can now complete the proof of the lemma. 

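Proof of Lemma 3.1. Let $T'$ be the correlation-free parity decision tree of depth at most $2d$ obtained
by refining $T$, as promised by Proposition 3.5. By Propositions 3.6 and 3.4,

$$\mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big] \le \mathop{\mathbf{E}}_{\ell' \in T'}\Big[\Big(\sum_{i=1}^n \ell'_i\Big)^2\Big] \le 2d.$$

□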


Remark 3.7. The same arguments used in the proof of Lemma 3.1 can also be sharpened to show that
the expression $\mathbf{E}_{\ell \in T}\big[(\sum_{i=1}^n \ell_i)^2\big]$ is bounded above by 2 times the average depth of the parity decision tree
$T$.


Remark 3.8. When $T$ is a standard decision tree, it is correlation-free, and so (4) directly implies that $\mathbf{E}_{\ell \in T}\big[(\sum_{i=1}^n \ell_i)^2\big] \le d$. It is
natural to ask whether Lemma 3.1 can be sharpened to obtain the same bound for parity decision
trees as well. It cannot: consider the function $\mathrm{MAJ}_3 : \{-1,1\}^3 \to \{-1,1\}$, which returns the sign
of $x_1 + x_2 + x_3$. One parity decision tree that computes $\mathrm{MAJ}_3$ queries $x_1x_2$ at the root and then
queries $x_1$ if $x_1x_2 = 1$, or $x_3$ otherwise. This tree has depth 2 but $\mathbf{E}_{\ell \in T}\big[(\sum_{i=1}^n \ell_i)^2\big] = \frac52 > 2$.
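The value $5/2$ can be confirmed exactly with rational arithmetic. The Python sketch below (our own, with a hypothetical nested-tuple encoding of the tree just described) groups the 8 inputs by the leaf they reach and averages $(\sum_i \ell_i)^2$ using `fractions.Fraction`.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical encoding: leaf = -1 or 1; internal node = (S, child_if_minus1, child_if_plus1).
# The tree of this remark: query x_1 x_2, then x_1 if x_1 x_2 = 1 and x_3 otherwise.
TREE = ((0, 1), ((2,), -1, 1), ((0,), -1, 1))

groups = defaultdict(list)
for x in product((-1, 1), repeat=3):
    path, node = [], TREE
    while not isinstance(node, int):
        S, child_minus, child_plus = node
        bit = prod(x[i] for i in S)
        path.append(bit)
        node = child_plus if bit == 1 else child_minus
    groups[tuple(path)].append(x)

# Exact E over leaves of (sum_i ell_i)^2, with sum_i ell_i = average of sum(x) on the leaf.
value = sum(Fraction(len(xs), 8) * Fraction(sum(map(sum, xs)), len(xs)) ** 2
            for xs in groups.values())
print(value, value > 2)  # 5/2 True
```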


4 The recursive majority function 

Let us now see how Theorem 1 yields a lower bound on the parity decision tree complexity of the 
recursive majority function. 

Theorem 2 (Restated). Every parity decision tree that computes $\mathrm{MAJ}_3^{\otimes k}$ has depth $\Omega(2.25^k)$.
Proof. By direct calculation, we observe that the Fourier expansion of the $\mathrm{MAJ}_3$ function is

$$\mathrm{MAJ}_3(x_1,x_2,x_3) = \frac12 x_1 + \frac12 x_2 + \frac12 x_3 - \frac12 x_1x_2x_3.$$
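By Fact 2.7, applied with $f = \mathrm{MAJ}_3$ and $g = \mathrm{MAJ}_3^{\otimes(k-1)}$ (which is balanced because it is an odd function), for every $k > 1$ we have

$$\sum_{i \in [3^k]} \widehat{\mathrm{MAJ}_3^{\otimes k}}(i) = \Big(\sum_{i \in [3]} \widehat{\mathrm{MAJ}_3}(i)\Big)\Big(\sum_{i \in [3^{k-1}]} \widehat{\mathrm{MAJ}_3^{\otimes(k-1)}}(i)\Big).$$

By induction, this identity yields

$$\sum_{i \in [3^k]} \widehat{\mathrm{MAJ}_3^{\otimes k}}(i) = \Big(\frac32\Big)^k.$$

Let $d$ be the minimal depth of any parity decision tree that computes $\mathrm{MAJ}_3^{\otimes k}$. By Theorem 1 (with $\sigma^2 = 1$, since $\mathrm{MAJ}_3^{\otimes k}$ is balanced), we
have $(\frac32)^k \le \sqrt{4\ln 2 \cdot d}$ and so $d \ge \frac{(9/4)^k}{4\ln 2} = \Omega(2.25^k)$. □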
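The identity $\sum_i \widehat{\mathrm{MAJ}_3^{\otimes k}}(i) = (3/2)^k$ can be confirmed by brute force for $k = 1, 2$ (the case $k = 3$ already has $2^{27}$ inputs). The Python sketch below (our own helper names) does this, and also locates the first $k$ for which the explicit bound $d \ge (9/4)^k/(4\ln 2)$ from this proof exceeds the adversary bound $2^k$. Since Theorem 2 is asymptotic, this crossover only concerns the explicit constant in this particular argument.

```python
from itertools import product
from math import log


def rec_maj(x):
    """MAJ_3^{(k)} on a tuple of length 3**k (three consecutive blocks per level)."""
    if len(x) == 1:
        return x[0]
    m = len(x) // 3
    a, b, c = rec_maj(x[:m]), rec_maj(x[m:2 * m]), rec_maj(x[2 * m:])
    return 1 if a + b + c > 0 else -1


def level1_sum(f, n):
    """sum_i hat f(i) = E_x[f(x) * (x_1 + ... + x_n)], by brute force over 2**n inputs."""
    return sum(f(x) * sum(x) for x in product((-1, 1), repeat=n)) / 2 ** n


for k in (1, 2):
    print(k, level1_sum(rec_maj, 3 ** k), 1.5 ** k)  # the two values agree


def depth_lower_bound(k):
    """The explicit bound d >= (9/4)^k / (4 ln 2) from the proof of Theorem 2."""
    return (9 / 4) ** k / (4 * log(2))


# First k at which this explicit bound exceeds the adversary bound 2^k.
first_k = next(k for k in range(1, 100) if depth_lower_bound(k) > 2 ** k)
print(first_k)  # 9
```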

5 Conclusion and open problem 

We have shown that the O’Donnell-Servedio inequality generalizes to the setting of parity decision
trees. A related conjecture of Parikshit Gopalan and Rocco Servedio posits that the O’Donnell-Servedio
inequality can also be generalized in a different direction, to the setting of Boolean
functions with low Fourier degree, where the Fourier degree of a Boolean function $f$ is the size of
the largest set $S$ such that $\hat f(S) \neq 0$.

Gopalan-Servedio Conjecture [O’D12]. Let $f : \{-1,1\}^n \to \{-1,1\}$ be a Boolean function
with Fourier degree $d$. Then $\sum_{i=1}^n \hat f(i) = O(\sqrt{d})$.

While the Gopalan-Servedio conjecture and Theorem 1 both generalize the O’Donnell-Servedio
inequality (as Fourier degree and parity decision tree depth are both upper bounded by regular
decision tree depth), they are incomparable to each other: the $n$-variable parity function has PDT
depth 1 and Fourier degree $n$, and conversely there are functions whose PDT depth is polynomially
larger than their Fourier degree [OST+14].
A Multiplicativity of the level-1 Fourier mass

Fact 2.7 is a direct consequence of the following identity. 

Proposition A.1. For any function $f : \{-1,1\}^m \to \{-1,1\}$, any balanced function $g : \{-1,1\}^n \to
\{-1,1\}$, and any $i \in [m]$ and $j \in [n]$,

$$\widehat{f \circ g}\big((i-1)n + j\big) = \hat f(i)\,\hat g(j).$$

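Proof. By definition, the Fourier expansion of $f$, and linearity of expectation,

$$\begin{aligned}
\widehat{f \circ g}\big((i-1)n + j\big) &= \mathbf{E}_x\Big[f\big(g(x_1,\ldots,x_n),\ldots,g(x_{(m-1)n+1},\ldots,x_{mn})\big)\cdot x_{(i-1)n+j}\Big] \\
&= \sum_{S \subseteq [m]} \hat f(S)\, \mathbf{E}_x\Big[\prod_{k \in S} g(x_{(k-1)n+1},\ldots,x_{kn}) \cdot x_{(i-1)n+j}\Big].
\end{aligned} \tag{5}$$

When $i \notin S$,

$$\mathbf{E}_x\Big[\prod_{k \in S} g(x_{(k-1)n+1},\ldots,x_{kn}) \cdot x_{(i-1)n+j}\Big] = \mathbf{E}_x\Big[\prod_{k \in S} g(x_{(k-1)n+1},\ldots,x_{kn})\Big] \cdot \mathbf{E}_x\big[x_{(i-1)n+j}\big] = 0.$$

Similarly, when $S \setminus \{i\} \neq \emptyset$, we can fix any $t \in S \setminus \{i\}$ and observe that

$$\mathbf{E}_x\Big[\prod_{k \in S} g(x_{(k-1)n+1},\ldots,x_{kn}) \cdot x_{(i-1)n+j}\Big] = \mathbf{E}_x\big[g(x_{(t-1)n+1},\ldots,x_{tn})\big] \cdot \mathbf{E}_x\Big[\prod_{k \in S \setminus \{t\}} g(x_{(k-1)n+1},\ldots,x_{kn}) \cdot x_{(i-1)n+j}\Big].$$

When $g$ is balanced, $\mathbf{E}_x[g(x_{(t-1)n+1},\ldots,x_{tn})] = 0$, so the only non-zero term of the sum in (5) is
the one where $S = \{i\}$, and

$$\begin{aligned}
\widehat{f \circ g}\big((i-1)n + j\big) &= \hat f(i)\, \mathbf{E}_x\big[g(x_{(i-1)n+1},\ldots,x_{in})\, x_{(i-1)n+j}\big] \\
&= \hat f(i)\, \mathbf{E}_x\big[g(x_1,\ldots,x_n)\, x_j\big] \\
&= \hat f(i)\, \hat g(j). \qquad \square
\end{aligned}$$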
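A brute-force spot check of Proposition A.1, as a Python sketch with our own helper names and 0-indexed coordinates (so the coefficient at position $(i-1)n + j$ becomes index `i * n + j`). The outer function is the 2-bit AND, which need not be balanced, and the inner function is the balanced $\mathrm{MAJ}_3$.

```python
from itertools import product


def compose(f, g, m, n):
    """f o g on m*n bits, with g applied to the m consecutive blocks of length n."""
    return lambda x: f(tuple(g(x[k * n:(k + 1) * n]) for k in range(m)))


def linear_coefficients(f, n):
    """[hat f(1), ..., hat f(n)] (0-indexed), by brute force."""
    inputs = list(product((-1, 1), repeat=n))
    return [sum(f(x) * x[i] for x in inputs) / len(inputs) for i in range(n)]


def and2(y):
    return 1 if y == (1, 1) else -1  # outer function; need not be balanced


def maj3(x):
    return 1 if sum(x) > 0 else -1   # inner function; balanced


m, n = 2, 3
lhs = linear_coefficients(compose(and2, maj3, m, n), m * n)
f_hat, g_hat = linear_coefficients(and2, m), linear_coefficients(maj3, n)
# Proposition A.1 (0-indexed): coefficient at i*n + j equals f_hat[i] * g_hat[j].
print(all(lhs[i * n + j] == f_hat[i] * g_hat[j] for i in range(m) for j in range(n)))  # True
```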


B Coarser bounds 

We can obtain a weaker version of Theorem 1 by combining Lemma 3.1 with the following easy 
inequality which is essentially equivalent to Lemma 3 in [OS08]. 

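Lemma B.1 (O’Donnell and Servedio [OS08]). Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a
parity decision tree $T$. Then

$$\sum_{i=1}^n \hat f(i) \le \mathop{\mathbf{E}}_{\ell \in T}\Big[\Big|\sum_{i=1}^n \ell_i\Big|\Big].$$

Proof. The linear Fourier coefficients of $f$ satisfy

$$\hat f(i) = \mathbf{E}_x[f(x)x_i] = \mathop{\mathbf{E}}_{\ell \in T}\,\mathbf{E}_{x : \mathrm{leaf}_T(x) = \ell}[f(x)x_i] = \mathop{\mathbf{E}}_{\ell \in T}\big[f(\ell)\,\mathbf{E}_{x : \mathrm{leaf}_T(x) = \ell}[x_i]\big] = \mathop{\mathbf{E}}_{\ell \in T}[f(\ell)\,\ell_i].$$

So $\sum_{i=1}^n \hat f(i) = \mathbf{E}_{\ell \in T}\big[f(\ell)\sum_{i=1}^n \ell_i\big] \le \mathbf{E}_{\ell \in T}\big[|\sum_{i=1}^n \ell_i|\big]$. □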
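The identity $\hat f(i) = \mathbf{E}_{\ell \in T}[f(\ell)\ell_i]$ from this proof, and the Jensen step used for Theorem 3 below, can be checked on the depth-2 tree for $\mathrm{MAJ}_3$ from Remark 3.8. The Python sketch uses our own hypothetical nested-tuple encoding of the tree.

```python
from collections import defaultdict
from itertools import product
from math import prod

# Hypothetical encoding: a leaf is -1 or 1; an internal node is
# (S, child_if_minus1, child_if_plus1) with S a tuple of 0-indexed coordinates.
TREE = ((0, 1), ((2,), -1, 1), ((0,), -1, 1))  # depth-2 PDT computing MAJ_3
N = 3

groups = defaultdict(list)  # leaf path -> inputs reaching it
labels = {}                 # leaf path -> leaf label f(ell)
for x in product((-1, 1), repeat=N):
    path, node = [], TREE
    while not isinstance(node, int):
        S, child_minus, child_plus = node
        bit = prod(x[i] for i in S)
        path.append(bit)
        node = child_plus if bit == 1 else child_minus
    groups[tuple(path)].append(x)
    labels[tuple(path)] = node

leaves = [(len(xs) / 2 ** N, labels[p], [sum(x[i] for x in xs) / len(xs) for i in range(N)])
          for p, xs in groups.items()]

# hat f(i) computed directly, and as E_ell[f(ell) * ell_i] as in the proof of Lemma B.1.
direct = [sum(labels[p] * x[i] for p, xs in groups.items() for x in xs) / 2 ** N
          for i in range(N)]
via_leaves = [sum(m * lab * ell[i] for m, lab, ell in leaves) for i in range(N)]
abs_moment = sum(m * abs(sum(ell)) for m, _, ell in leaves)  # E_ell |sum_i ell_i|
sq_moment = sum(m * sum(ell) ** 2 for m, _, ell in leaves)   # E_ell (sum_i ell_i)^2

print(direct, via_leaves)                  # [0.5, 0.5, 0.5] [0.5, 0.5, 0.5]
print(sum(direct), abs_moment, sq_moment)  # 1.5 1.5 2.5
```

For this tree Lemma B.1 holds with equality, $\sum_i \hat f(i) = \mathbf{E}_{\ell}\big[|\sum_i \ell_i|\big] = 3/2$, because every leaf label agrees in sign with $\sum_i \ell_i$.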
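We are now ready to complete the proof of the slightly weaker version of Theorem 1.

Theorem 3. Let $f : \{-1,1\}^n \to \{-1,1\}$ be computable by a parity decision tree $T$ of depth $d$. Define
$\sigma^2 = 4\Pr[f(x) = 1]\Pr[f(x) = -1]$ to be the variance of $f$. Then

$$\sum_{i=1}^n \hat f(i) \le \sqrt{2d}.$$

Proof. We may assume that $\sum_{i=1}^n \hat f(i) \ge 0$, since otherwise there is nothing to prove. By Lemma B.1 and Jensen’s inequality,

$$\Big(\sum_{i=1}^n \hat f(i)\Big)^2 \le \Big(\mathop{\mathbf{E}}_{\ell \in T}\Big[\Big|\sum_{i=1}^n \ell_i\Big|\Big]\Big)^2 \le \mathop{\mathbf{E}}_{\ell \in T}\Big[\Big(\sum_{i=1}^n \ell_i\Big)^2\Big].$$

Theorem 3 then follows directly from Lemma 3.1. □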










